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Abstract 

In this paper we present a new way to understand the timing of branch- 
ing events in phylogenetic trees. Our method explicitly considers the 
relative timing of diversification events between sister clades; as such it 
is complimentary to existing methods using lineages-through-time plots 
which consider diversification in aggregate. The method looks for evidence 
of diversification happening in lineage-specific "bursts", or the opposite, 
where diversification between two clades happens in an unusually regular- 
fashion. In order to be able to distinguish interesting events from stochas- 
ticity, we propose two classes of neutral models on trees with timing infor- 
mation and develop a statistical framework for testing these models. Our 
models substantially generalize both the coalescent with ancestral popu- 
lation size variation and the global-rate speciation-extinction models. We 
end the paper with several example applications: first, we show that the 
evolution of the Hepatitis C virus appears to proceed in a lineage-specific 
bursting fashion. Second, we analyze a large tree of ants, demonstrating 
that a period of elevated diversification rates does not appear to occurred 
in a bursting manner. 



Introduction 

Understanding the tempo and mode of diversification is one of the major 
challenges of evolutionary biology. Phylogenetic trees with timing information 
are powerful tools for answering questions about tempo and mode. Such trees 
were once available only in situations with a rich fossil record, where the timing 
information might have come from radiocarbon dating or stratigraphic infor- 
mation. However, modern techniques of phylogenetic analysis are capable of 
reconstructing not only the topology of phylogenetic trees, but can also recon- 
struct information about the timing of diversification events even when limited 
or no fossil evidence is available. This can be done in one of a numbe r of wa ys. 
One can first test if a molecular clock is appropriate [see iFelsensteinl (|2004l) p. 
323], then reconstruct under the assumption of a molecular clock. One can 
reconstruct a tree with bran ch lengths using any method and then apply rate 
smoothing (|Sandersonl 120031 ) . One may also choose from the variety of "relaxed 
clock" methods which allow the rate of substitution to vary wit hin the tree 
( Gill espie] . 1984 ; Huelsenbeck et al. . l200d iDrummond et all . [2006a). Of course, 
the accuracy of any these techniques depend on a correct choice of model and 
a strong phylogenetic signal along with perhaps some fossil calibration points. 

Phylogenetic trees with timing information can then be used to make infer- 
e nces about the f orces guiding the evolution of the taxa. For example, the paper 
of lMoreau et al. (2006) notes that there was a period of high dive rsification rate 
in ant lineages during the rise of angiosperms. Another paper bv lHarmon et aT 



(2003) uses the deviation of four groups of lizards from the pure-birth model of 
diversification to make inferences about their evolutionary radiations. 

Given the number of methods available for reconstructing phylogenetic trees 
with diversification timing information, and the interest in investigating tempo- 
ral properties of those trees, the number of direct methods available to investi- 
gate timing information on phylogenetic trees is surprisingly small. The most 



2 




Figure 1: A motivating example showing "bursting" diversification. Namely, in 
the oldest part of the tree, diversification events happen exclusively in the B 
lineage, followed by a period of high diversification rate in the A lineage. This 
paper constructs a statistical framework for analyzing such "bursting" patterns 
or their opposite. 



popular ways of investigating timing in phylogenetic trees a re lineage-through- 
time ( LTT) plots and the associated 7 statistic introd uced bv lPvbus and Harvev 
(2000) [for a helpful review article see (Ricklcfs, 2007)]. LTT plots have time t on 
the x axis and simply show the number of lineages which were present in the phy- 
logenetic tree at time t on the y axis. A constant rate pure-birth process would 
have the number of taxa increa sing exponentially; i t is common to compare LTT 
plots to an exponential curve ( Zink and SlowinskiL 1995 : Harmon et al. . 2003 ). 



The 7 statistic is computed based on the periods during which the LTT plot 
stays constant (called the "inter-node intervals"); the 7 for a pure-birth diver- 
sification process will have a standard normal distribution. Broadly speaking 
7 < implies that diversification rates were high early in history, while 7 > 
implies that most diversification has happened m ore recently. A similar stat istic 
with the same goals in mind was constructed bv lZink and Slowinskil (jl995l ). 

However, much more information is available in a phylogenetic tree with 
diversification timing information than can be summarized in a LTT plot or a 
derivative statistic. Consider the tree in Figure 1, with two sets of sister taxa, 
A and B. The taxa in B had a period of relatively high diversification rate 
early in evolutionary history, during which time the lineage leading to A is in a 
period of stasis. Then lineage A experiences a burst of diversification, and the 
taxa in B do not experience any lineage-splitting events during this time. We 
will call the sort of diversification seen in Figure 1 "lineage-specific bursting" 
(LSB) diversification, i.e. where the relative diversification rates in two sister 
clades vary over time. 

The lineage-specific bursting diversification seen in Figure 1 would not be 
apparent in an LTT plot. Indeed, LTT plots take the timing information out 
of the context of the phylogenetic tree from which from which they are derived, 
and thus ignore information about how the timings relate to topology of the 
tree. This context can be crucial, as we now argue. 

One would like to be able to say if, for example, the pattern seen in Figure 1 
arose simply "by chance." In order to do so, we need two things: first, a 
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convenient way to summarize the timing information, and second, a set of null 
models which define what we mean with "by chance." For a given internal node, 
we summarize the timing information at that node by writing down the order of 
diversification events by clade. For instance, we associate with the root node of 
Figure 1 the sequence s = BBBBBAAAAAAAB which we will call a "shuffle" 
in analogy to a shuffling of cards labeled A and B. We make a more formal 
definition of shuffles in the section labeled "Tree Shuffles." 

Now that we have summarized the timing information at the root node 
as a shuffle s, we would like to think about if s arose "by chance." This of 
course requires us to define a probability distribution on shuffles; we demonstrate 
below that a wide class of null models on phylogenetic trees give the uniform 
distribution on shuffles. The uniform distribution in this setting is what one 
would get by throwing the A's and .B's of the shuffle into a bag and drawing 
them out one by one uniformly. Thus it seems reasonably unlikely that the 
shuffle s would arise by chance, having first a long run of -B's then a long run 
of A's. 

We can attach a p-value to a shuffle by using the "runs distribution." The 
number of "runs" is simply the number of sequences of the same letter: in this 
case, there is a run of B's, then a run of A's, then another B. That totals three 
runs. Under the uniform distribution, the probability of seeing a given number 
of runs in this setting is known from classical statistics, and can be calculated 
via Equation (TT]). The probability of seeing 3 runs with 6 A's and 7 B's is 
about 0.00641, and the probability of seeing 2 runs is about 0.00117. We can 
interpret the sum of these two probabilities, 0.00758, as the significance level of 
the LSB diversification seen in Figure 1. Being below the 1% significance level, 
we can interpret this shuffle as being quite significant; thus if the tree in Figure 1 
came from data, the observed lineage-specific diversification might require some 
explanation. Please note that for simplicity this example only considers the 
root shuffle; however the main body of the paper is dedicated to investigating 
all shuffles simultaneously. 

The first aim of this paper is to provide analytical tools to compare diversifi- 
cation rates between lineages. In doing so, we hope to provide a complimentary 
perspective to that provided by LTT plots and associated statistics. In particu- 
lar, our method can detect "lineage-specific bursting" (LSB) diversification, i.e. 
where the diversification rates in two sister clades vary over time. One might ex- 
pect LSB diversification if a lineage diversifies to fill variants of a single niche, or 
if a key innovation appears which makes further diversifications more likely. By 
comparing the results of our analysis to results using LLT plots, we may be able 
to tease apart causes of diversification rate changes — are they lineage-specific 
or due to global events? 

The second aim of this paper is to investigate null models of phylogenetic 
trees with timing information. In contrast to the setting of phylogenet i c tree 
shape, where a num ber of models are available (|Aldoud . 1 19951 : iFordl . 120051: 



Mooers et al.l . 120071) . there are relatively few models available for trees with 



timing information. Null models are important as they allow us to distinguish 
between stochastic sampling and actual events which need investigation; they 
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are thus important tools for assessing significance. 

We conclude the paper with example applications. Our first example appli- 
cation uses Hepatitis C (HCV) data, and shows that trees from this data demon- 
strate a limited but significant amount of LSB diversification. This analysis may 
imply a note of caution for researchers using coalescen t methods to analyze HCV 
data. Our second application is to the ant data of ( Moreau et al. . 2006f) and 



( Moreaul lin review! ) . the lineages of which do not appear to demonstrate signif- 
icant LSB diversification, despite some other interesting characteristics of their 
history. 

Our paper is one contribution to the area of understanding mechanisms 
of diversification from phylogenetic trees. Besides lineages through time plots 
and 7, there is an entire literature on phylogenetic tree shape (which does no t 



include branch length); for an excellent review see iMooers and Heard! (|1997l ). 
There a re also a num ber of interestin g papers which use trait information, for 
exa mple|Pagel| l|l997h andEsi I^OOfih . However, our method is the first to use 



just a phylogenetic tree with branch lengths in a way which integrates both 
sources of information. 

Tree shuffles 

Our method is based on "ranked" phylogenetic trees: trees for which the 
order of branching events in the tree is specified in a way compatible with the 
topology ( more specific definition below). Such trees have been called "den- 
drograms" ( Page! . Il991 ). As we discuss below, a ranked phylogenetic tree is 



equivalent to a phylogenetic tree with a "shuffle" at each internal node speci- 
fying relative timing information. As described below, a broad class of neutral 
diversification models give the uniform distribution on shuffles, which leads to 
some natural tests for deviation from these models. Thus evidence of deviation 
from the uniform distribution on shuffles is evidence of deviation from this entire 
broad class of neutral models. (Note that by "model" we mean a forward time 
ranked-tree-valued stochastic process. Often, a model with branch lengths is 
given, in which case we consider the induced model given by considering ranks.) 

The intuition behind the shuffle idea is presented in Figure 2. As shown in 
this figure, the relative order of bifurcation events for an internal node of a tree is 
determined by the sequence of full and hollow circles on the left side of each tree. 
We call this sequence a "shuffle." Shuffles also have a natural interpretation 
in terms of evolutionary history. Namely, "bursting" diversification leads to 
symbols of a shuffle clustering together. The opposite situation, where there is a 
post-diversification delay before a lineage can diversify again, can be recognized 
by the interspersing of diffe rent symbols. This latt er situation has been called 
"refractory" diversification (iLosos and Adlerl . ll995l) . 



We now make more formal definitions of our terms. For the purposes of 
this paper, a phylogenetic tree is a rooted tree with distinct leaf labels. We 
will denote the set of interior nodes of a phylogenetic tree T with Nt- For an 
internal node v in Nt, define T v to be the rooted subtree of T containing all the 
descendants of v. The "daughter trees" of v are the two subtrees of T v which 
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Figure 2: A shuffle at a given internal node. Bifurcations on the left subtree are 
marked with a hollow circle, and those on the right subtree are marked with a 
solid circle. The relative timing for these events is shown beside the tree; we 
call this sequence of symbols a "shuffle." A set of shuffles for every internal 
node of a phylogenetic tree exactly determines the relative order of bifurcation 
events. Similar type symbols occurring together as in the left tree is evidence 
of lineage-specific bursts. 



we obtain by deleting v and its two incident edges. For the first part of the 
paper, we assume that the phylogenetic trees are bifurcating. We describe later 
how to generalize the ideas presented to the case of multifurcating trees. A rank 
function on an arbitrary set S is simply an ordering of the elements of that set; 
mathematically it is a one-to-one mapping from S to ranks {1,2, ...JS*!}. A 
rank function on a phylogenetic tree T is a rank function on the set of interior 
vertices Nj> with the property that the ranks are increasing on any path from 
the root to a leaf. We call a phylogenetic tree with a rank function a ranked 
phylogenetic tree or simply ranked tree demote and Steel to. 



In mathematics, a total order on a set is simply a binary relation (usually 
written <) such that for any two distinct elements a and b of the set either 
a < b or b < a. Note that a rank function on a set is equivalent to a total order 
on that set: given a total order one can rank the elements in increasing order 
of rank, and given a rank function one can define a total order by numerical 
inequality of rank. Thus a ranked phylogenetic tree is exactly a tree equipped 
with a total order on its internal nodes. 

In this paper, an (to, n) shuffle on symbols p and q is simply a sequence of 
length m + n containing to p's and n q's. (The compl ete terminology for such 
a sequence is riffle shuffle (jAldous and Diaconislll986h .) For example pqppq is 



a (3, 2) shuffle on p and q. The utility of these shuffles in the present context is 
summarized in the following observation. 

Observation 1. Given totally- ordered sets P and Q, the total orderings of PUQ 
respecting the given orderings of P and Q are in one-to-one correspondence with 
the (\P\, \Q\) shuffles on symbols p and q. 

To see how this works, assume total orders p\ < p2 < ■ ■ ■ < p m on P and 
qi < qi < • ■ ■ < Qn on Q are given, along with a (to, n) shuffle on p and q. The 
required total ordering on P U Q is obtained by progressing along the shuffle 
and substituting pi and qj for p and q in order: for example the shuffle pqppq 
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uniquely defines the total order pi < qi < P2 < P3 < 92 when pi < P2 < P3 and 
qi < <72- In the other direction, a total ordering on P U Q uniquely defines a 
(|P|, |Q|) shuffle and a total ordering on each of P and Q. 

We can use shuffles to develop a recursive formulation of ranked phylogenetic 
trees. Assume that v is an internal node of a tree and that the tree T v containing 
the descendants of v is composed of two daughter subtrees L v and R v . Assume 
L v and R v have m and n internal nodes, respectively. We define a "shuffle at an 
internal node" v to be a (m, n) shuffle on symbols I and r. Assume L v and R v 
are ranked subtrees, i.e. there is a total ordering on the internal nodes of each 
of L v and R v . By Observation [TJ a total ordering on the internal nodes of T v 
respecting the orderings on the internal nodes of L v and R v is equivalent to a 
shuffle at the internal node v. Therefore we can recursively reconstruct the rank 
function for any ranked tree given a shuffle at each internal node. We define 
a tree shuffle to be such a choice of shuffles. With Observation [1] we have the 
following result, which is crucial to our analysis: 

Observation 2. Each rank function on a given tree being equally likely is equiv- 
alent to the statement: For each internal node v, each shuffle at v is equally 
likely. 

Neutral models for ranked trees 

In this section we formulate two classes of neutral models. First we introduce 
the constant across lineages models, which are an obvious generalization of the 
coalescent/Yule models. Then we introduce the constant relative probability 
models, which generalize coalescent/Yule models in a new direction. The com- 
mon theme between these two classes of models is that they both induce the 
uniform distribution on shuffles. 

Ranked oriented trees 

In this section it will be convenient to discuss ranked oriented trees rather than 
ranked phylogenetic trees. This allows us to distinguish the children of each 
vertex without having to explicitly label species which may later become ex- 
tinct. The distributions and statistics considered in this paper may be easily 
transferred between these two types of trees, as well as ranked unlabeled trees, 
which we call ranked tree shapes. For the present purposes, definitions and proof 
are much easier in the case of ranked oriented trees. 

Definition 3. A oriented tree is a finite rooted binary tree where the children 
of each internal node are labeled left and right respectively. A ranked oriented 
tree is a oriented tree with a rank function. 

Note that here "tree" in this definition is not short for "phylogenetic tree." 
That is, our notion of ranked oriented tree does not include leaf labels. These 
trees are called "oriented" because they are oriented graphs, i.e. the edges 
around each vertex have a fixed orientation. 
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There is a map from ranked oriented trees to ranked tree shapes which forgets 
the ordering of children. Similarly, there is a map from ranked phylogenetic 
trees (called ranked trees in this context) to ranked tree shapes which forgets 
leaf labels. Both of these "forgetful" maps induce maps between probability 
distributions on these sets of trees. 

We now prove that the uniform distributions on ranked oriented trees and on 
ranked phylogenetic trees induce the same distribution on ranked tree shapes. 

Recall that at cherry is a pair of adjacent leaves: two leaves with the same 
parent. 

Proposition 4. Given a ranked tree shape with n leaves and k cherries, there 
are 2 n ~ 1 ~ k ranked oriented trees sent to it by the forgetful map. 

Proof. First, note that the n — 1 internal nodes of a given ranked tree shape 
are distinguished by their rank. Thus every ranked oriented tree which maps 
to this tree shape must be formed by assigning an orientation at each of these 
internal nodes. For each of these there are two possible left-right labclings of the 
child subtrees, giving 2™ _1 ranked oriented trees. However, for the k internal 
vertices which are the parents of a pair of leaves the ordering of children does 
not effect the resulting ranked oriented tree. For all other n — 1 — k internal 
vertices the ranking of internal vertices ensures that the two orderings of child 
subtrees are distinguishable. Thus there are exactly 2 n ~ 1 ~ k distinct ranked 
oriented trees. □ 

Proposition 5. Given a ranked tree shape with n leaves and k cherries, there 
are ^ ranked phylogentic trees sent to it by the forgetful map. 

Proof. Similarly to the previous proof, there arc n! ways to label the identified 
leaves of a ranked tree shape. However, the labels of the two leaves of a cherry 
may be switched without changing the ranked phylogentic tree. Such switches 
(and their combinations) are the only such transformations which leave the 
phylogenetic tree unchanged. Thus, there are ^ distinct leaf labelings of the 
ranked tree shape. □ 

Together, these give us the desired result: 

Proposition 6. A uniform distribution on ranked oriented trees with n leaves 
and a uniform distribution on ranked phylogenetic trees with n leaves both induce 
the same distribution on ranked tree shapes. 

Proof. Given a ranked tree shape with n leaves and k cherries, there are 2™~ 1 ~ fe = 
2 n ~ 1 /2 k ranked oriented trees which map to this tree and n\/2 k ranked phylo- 
genetic trees which map to this tree. Thus, for both of the induced probabilities 
on ranked tree shapes, the ratio between the probability of a tree with k cherries 
and the fixed tree with 1 cherry is l/2 fe_1 . As both distributions are probabilities 
they must be equal. □ 

Proposition 7. There are (n — 1)! ranked oriented trees on n leaves. 



8 



Proof. Proceed by induction on n: for n = 2 the statement is obviously true. 
Suppose there are (n — 1)! ranked oriented trees (ROTs) with n leaves. Now 
note that the next bifurcation event is uniquely identified by its rank, and it 
can occur in n places, thus there will be n\ ROTs with n + 1 leaves. □ 

It is well known that the Yule model gives the uniform distribution on ranked 
phylogenetic trees on n taxa for each n (jEdwardsl ll9T0l ) . By Proposition [6j this 
corresponds to the uniform distribution on ranked oriented trees. A direct proof 
goes as follows: 

Proposition 8. The Yule model results in the uniform distribution on ranked 
oriented trees with n leaves after n — 1 bifurcations. 

Proof. Given a ranked oriented tree with n leaves, there are n possible places 
for a bifurcation event to occur. These are all equally likely. By induction the 
(n — 1)! ranked oriented trees with n leaves were all equally likely, so the n! 
ranked oriented trees with n + 1 leaves are also all equally likely. □ 

The following lemma will be useful shortly. 

Lemma 9. Given a ranked oriented tree (ROT) with n leaves, there are n(n+l) 
ways to add an additional leaf. 

Proof. First, decide which rank the new internal node will have, from 1 (earliest) 
to n (latest). If the new internal node has rank k then there are k choices at that 
level for the edge to add it to, and then 2 choices for which side of this edge the 
new pendant leaf will sit. This gives a total of 2 Y17=i * = 2 n< -" 2 +1 - ) = n{n + 1) 
ways to insert the new leaf edge. □ 

In the sequel, we consider forward time ranked- oriented-tree- valued stochastic 
processes. In particular we consider birth-death processes where the transitions 
between trees involve either an bifurcation or deletion event (e.g. speciation or 
extinction). Call these ranked- oriented-tree birth-death processes. The details 
of bifurcation (birth) and extinction (death) events are as follows. If there is a 
bifurcation event, in which two pendant leaves are attached to an existing leaf, 
the new branches descending from the bifurcation event are assigned left and 
right. If there is an extinction event, occurring at a leaf vertex, the leaf and its 
adjacent edge are deleted. The ancestor of the extinct leaf is now a degree- two 
vertex. This vertex is suppressed by replacing it and its two adjacent edges 
by a single edge, with an orientation for the new edge inherited in the obvious 
way. In this way, the ancestor still has a left and right child. The ranking of 
internal vertices is induced by the time ordering of their associated bifurcation 
events. If extinction events do not occur then call such a process a pure-birth 
ranked- oriented-tree process. 

In the later section on example applications we consider, for computational 
convenience, the likelihood of rank functions on tree shapes rather than on 
oriented trees. The following proposition and corollary show that the uniform 
distributions on rank functions of oriented trees in the models and analysis 
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which follow induce uniform distributions on tree shapes when orientation is 
forgotten. In particular, p- values for such rank functions may be computed over 
either oriented or unoriented tree shapes. 

First, define a symmetric vertex to be one for which the unoriented shapes 
of the subtree below each child of this vertex are the same (isomorphic as tree 
shapes). A big symmetric vertex is a symmetric vertex with more than two 
leaves below. 

Proposition 10. A uniform distribution on rank functions on a given oriented 
tree induces a uniform distribution on rank functions of its corresponding tree 
shape. 

Proof. Let t be an oriented tree with n leaves and t 1 its corresponding tree 
shape. Let q denote the number of big symmetric vertices of t. For n = 2, 
which implies q = 0, we have for the ranking on t' exactly 1 = 2° ranking on 
t. Assume the following statement is true for all oriented trees with less than 
n leaves: for each ranking on t' there are exactly 2 q rankings on t which are 
sent to it by the map which forgets orientation of vertices. The induction now 
breaks into three cases. 

Case 1: Suppose the two children of the root branch-point of t' are non- 
isomorphic tree shapes, each having more than 1 leaf. They may therefore be 
distinguished from each other, and given a ranking on t' the shuffle at the root 
node of t is determined. Call the two child subtrees "left" and "right" with q\ 
and qi big symmetric vertices, respectively. By the inductive assumption, there 
are 2 qi rankings for the left subtree of t and 2 92 rankings for the right subtree 
which map to the corresponding rank function on the left and right subtree 
shapes. This gives 2 qi+q2 total since there is no choice for the shuffle at the root 
branch-point of t. This is the number of big symmetric vertices of t' . 

Case 2: Suppose the two children of the root branch-point of t' are non- 
isomorphic tree shapes, one of the children being a leaf. They may therefore be 
distinguished from each other, and given a ranking on t' the shuffle at the root 
node of t is determined. The bigger subtree tb has qb big symmetry vertices. 
By the inductive assumption, there are 2 qb rankings for tb which map to the 
corresponding rank function on its shape. Attaching a leaf to tb to obtain t does 
not change the number of rankings for t or t'. Therefore, there are 2 qb rank 
functions on t which map to the given rank function on t', and qb is the number 
of big symmetric vertices of t' . 

Case 3: Suppose that the two children of t' are isomorphic. Therefore they 
may not be distinguished except by the ranking. Therefore the shuffle at the 
root branch-point of t is only determined up to swapping the left and right 
subtrees. After this choice the two subtrees are distinguished: which subtree 
of t' is "left" and which is "right" is determined by the shuffle. The rest of 
the argument proceeds as before, except that this time there are 2 qi+q2+1 rank 
functions on t which map to the given rank function on t', and qi + qi + 1 is the 
number of big symmetric branch-point of t' . 

The result now follows by induction. □ 
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Corollary 11. If a probability function on ranked oriented trees is uniform 
on rank functions conditioned on oriented tree then it is also uniform on rank 
functions of an (unoriented) tree shape when conditioned on that (unoriented) 
tree shape. 

Proof. This follows from the previous proposition because the resulting mixture 
of uniform distributions on rank functions on t' (one for each oriented tree t 
with shape t') is also uniform. □ 

This corollary allows us to apply our rank tests to trees which are given 
without orientation- ranked tree shapes. 

Constant across lineages (CAL) models 

We define a constant across lineages (CAL) model to be a forward time ranked- 
oriented-tree birth-death process such that any new (bifurcation or extinction) 
event is equally likely to occur in any extant lineage. You may also think of 
the projection of this process onto ranked trees, by forgetting the orientation of 
children at each internal vertex. Any model described in terms of rates is a CAL 
model if the bifurcation and extinction rates are equal between lineages at any 
given time. However, these rates may vary in an arbitrary fashion depending on 
time or the curren t state of the process. This class of mo dels includes the Yule 



mode l (jYuld . Il924j ). the critical branching process model (Aldous a nd Popovic . 



coalescent (jKingmanl . |1982) 



20051 ). the constant rate b irth and death process ( Nee et all " 1994 ) and the 



However, the CAL class is more general. It includes macroevolutionary 
models that have global speciation and extinction rate variation, for example 
due to global environmental conditions. Furthermore, it is also possible to 
incorporate models which take into account incomplete random taxon sampling, 
which is equivalent to the deletion of k species uniformly at random from the 
complete tree. Indeed, if the complete tree evolved under a CAL model then 
we simply run the model for longer with the probability of bifurcation set to 
zero and the extinction probability non-zero (and uniform across taxa). This 
extended model is clearly still within the CAL class. 

The CAL class also includes microevolutionary models such as the coalescent 
with arbitrary population size history. This very simple but important fact 
means that the tests for non-neutral diversification described in later sections 
are not fooled by ancestral population size variation (as are a number of other 
tests in the literature). 

The CAL definition is a generalization of the "exchangeable" criterion from 
Aldousl (l200lh . and we acknowledge the importance of Aldous' ideas in formu- 



lating the definition. 

Proposition 12. At all times in a CAL model, the distribution of ranked ori- 
ented trees with n leaves is uniform. 

Proof. Assume that after k events, all (m — 1)! ranked oriented trees (ROTs) 
of size m are equally likely. If the next event is a bifurcation then, because the 
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result of each (tree, bifurcation event) pair is distinct, after this event all m! 
ROTs with m + 1 leaves are equally likely. Similarly, if the next event is an 
extinction then for each of the (m — 1)! equally likely trees there are m equally 
likely choices for which leaf to extinguish, giving ml possibilities in all. By 
Lemma IH1 each ROT with m — 1 leaves results from m(m — 1) of these tree-plus- 
leaf choices. Thus each ROT with m — 1 leaves is equally likely, with probability 
m(m — 1) * l/m\ — l/(m — 2)!. 

Since this is true for any such sequence of bifurcations and extinctions it is 
true at all times. □ 

Recall that this corresponds to a uniform distribution on ranked phylogenetic 
trees, by Proposition [6] 

Of course, any model giving the uniform distribution on ranked trees with n 
tips gives the uniform distribution on rank assignments given a topology with 
n tips. Thus 

Corollary 13. Any CAL model gives the uniform distribution on rank assign- 
ments (and thus tree shuffles) given a tree topology. 

We have the following limited converse of Proposition [T2l 

Proposition 14. Pure-birth CAL models are the precisely the set of pure-birth 
ranked- oriented-tree processes which, for any n > 1, give the uniform distribu- 
tion on ranked oriented trees with n taxa when halted as soon as n taxa are 
present. 

Proof. By the proof of Proposition [T21 pure-birth CAL models result in a uni- 
form distribution on ranked oriented trees of size n (since there have been exactly 
n — 1 events). 

Now consider a model which does not satisfy the CAL condition. Assume 
that the fc-th bifurcation event was not picked uniformly among lineages, i.e. 
there is a ranked tree To with lineages l\ and I2 which have probabilities p\ 7^ P2 
to speciate. Let T\ (respectively T2) be the ranked tree produced if l\ (respec- 
tively I2) bifurcates. In a pure birth process, T\ and T2 may only be reached in 
this way. Now 

P[Ti] = P[T ] • pi + P[T ] • V 2 = P[T a ] 

which shows that this model cannot give the uniform distribution on ranked 
trees when the process is halted at k taxa. There is only one way to build 
each ranked oriented tree with n leaves so the distribution on these cannot be 
uniform, since an equal number must descend from each of T\ and T%. Thus, 
by contradiction, there is no such k and so no such model. □ 

Note that in the last proposition, the restriction to a pure-birth process is 
needed. Consider a process with extinction where bifurcation is equally likely for 
each species but extinction is history dependent: whenever an extinction event 
occurs, it undoes the most recent bifurcation event. This model clearly does not 
belong to the class of CAL models. However, it gives a uniform distribution on 
ranked trees of some fixed size. 
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Constant relative probability (CRP) models 

The motivation for the constant relative probability (CRP) models comes from 
considering the models on ranked trees which might emerge from non-selective 
diversification, perhaps based on physical or reproductive barriers. For example, 
assume we could watch a set of species emerge via allopatric speciation, and 
the fundamental geographic barrier is a mountain range dividing land into two 
regions, A and B. These regions may differ in size or fecundity, so there may 
be some difference in the rate of diversification in A versus B. However, our 
neutral assumption for the CRP class is that the relative rate stays constant over 
time. In contrast, non-neutral models might dictate that a bifurcation in one 
region will shift the equilibrium such that further diversification in that region 
will become more likely ("bursting" diversification) or less likely ("refractory" 
diversification) . 

Again, for convenience, we work with ranked oriented trees so we may dis- 
tinguish the two children of any bifurcation event. For each internal node, v, 
(representing a bifurcation event) let L v and R v denote the "left" and "right" 
lineages descending from v (daughter subtrees of v). 

A constant relative probability (CRP) model is a forward time pure-birth 
ranked-oriented-tree process together with a probability distribution P on the 
unit interval [0, 1], where each internal vertex has a real number, p v associated 
with it. Each new bifurcation occurring in the clade below v occurs in L v 
with probability p v , and occurs in R v with probability 1 — p v . For each new 
bifurcation event (internal vertex), v, choose the value p v by an independent 
draw from P. As with CAL models, there is no constraint of any kind on waiting 
times between bifurcation events. 

Recall the map from ranked oriented trees to unranked oriented trees which 
forgets the rank ordering of internal nodes and the leaf labels. The image of a 
ranked tree under this map is its oriented tree. Similarly, if the orientation is 
also forgotten then call the image the tree shape of the initial tree. 

Proposition 15. A CRP model, stopped at a time depending only on the time 
and number of leaves, gives the uniform distribution on rank functions for each 
oriented tree. 

Proof. Consider the distribution of ranked oriented trees resulting from the 
stopped CRP. Consider a particular oriented tree, t, with k internal vertices 
v\,...,Vk- Let rii and mi denote the number of internal vertices below the 
left and right subtrees, respectively, of vertex Vi. Fix a ranking on this tree. 
We now compute the probability of this ranked oriented tree under the model 
(conditional on the total number of leaves). Fix an assignment of p Vi to each 
internal vertex Vi. Given this choice, the probability of the given ranked tree is 
the product of the probabilities of each bifurcation event. For a bifurcation at 
vertex Vi , the probability of this event is the product of p v for all vj for which 
Vi lies on its left subtree times the product of (1 — p v ) for all Vj for which Vi lies 
on its right subtree. In the product of these probabilities over all m, the term 
p Vj occurs exactly rij times (once for each internal vertex on the left subtree of 



13 



Vj) and the term (1 —p Vj ) occurs exactly rrij times (once for each internal vertex 
on the right subtree of Vj). Thus, the probability of this ranked tree (given the 
choice of p v ) is: 

fe 

3=1 

Note that this is independent of the ranking. Since the p Vi are picked indepen- 
dently from a distribution P, the probability of this ranked tree shape is 



/ • • / H/^:l /'•• )"' dP ■ ■ ■ dP 



"* 3=1 

which is again independent of the ranking. Therefore, all rankings of this ori- 
ented tree are equally likely. □ 



Note that the CRP generalizes the stick-breaking models ( Aldousl . 1995 ) 



Recall that with the stick-breaking model, a stick is recursively broken into 
pieces, with the break point of each piece chosen independently from a prob- 
ability P on the open unit interval (0,1). For example, if the number chosen 
for a piece was 1/2 then that piece is broken into two equal sized pieces. For 
each piece a new draw is taken from P to determine the how far along to break 
it. To generate a finite oriented binary tree with n leaves, first break a stick as 
just described then choose n points from the unit interval uniformly at random. 
This determines a consistent set of partitions corresponding to a binary tree. 

It is well known that the Yule model is generated by setting P to the uniform 
measure on (0, 1). Similarly, Aldous's Beta model corresponds to setting P to 
be a beta measure. 

Thus, the CRP process produces oriented ranked versions of such trees in a 
sequential growth process. 



Tests for bursting diversification based on shuffles 

In the previous section, we demonstrated that any model satisfying the con- 
stant across lineages or constant relative probability criteria induces the uniform 
distribution on tree shuffles. In this section we describe a way of testing for devi- 
ation from the uniform distribution on tree shuffles, and thus test for deviation 
from these neutral models. We emphasize that this can go significantly beyond 
testing the coalescent/Yule model, which is typically considered to be the def- 
inition of neutrality. Indeed, rejection of the uniform distribution on shuffles 
rejects all of the CAL and CRP models simultaneously, and the coalescent/Yule 
model is only one model in these classes. We note further that although the 
focus of this section is to consider all of the shuffles of a ranked tree at once, one 
can also consider a shuffle at a particular node as described in the introduction. 

There are several useful tools available to test whether a shuffle is likely to 
have come from the uniform distribution on shuffles. In fact, a number of tests in 
the statistics literature have been developed for testing equality of distributions 
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which actually implement a test of deviation from the uniform distribution for 
shuffles. These tests work as follows: assume we are given two sets of samples 
{£i}i=i,...,m and {r J -} i j=i,... ) n and would like to test the hypothesis that they 
are draws from the same distribution. To test, combine the draws and put the 
samples in increasing order (assume that all draws are distinct). This clearly 
gives a shuffle on symbols I and r. If the draws are from identical distributions 
then the induced distribution on shuffles will be uniform; if on the other hand 
symbols cluster together in the shuffle, there is some evidence that the draws 
are from unequal distributions. 

One can then test deviation from the uniform distribution on shuffles in one 
of several ways. One way is to count the number of "runs." As described in 
the introduction, a run is simply a sequence within the shuffle using only one 
symbol; the shuffle llrrrrl has three runs. Let X m _ n denote the number of 
runs under the uniform distribution on shuffles on m symbols of one type and 
n of a nother. The distribution of X OT ,„ is classical (see, e.g. iHogg and Craig 
( 199i ): 

P{X m ,„ = 2k + 1} 
P{X„ hn = 2k} 

Asymptotic results for the mean and variance are also known: 

E [X mn \ = n mn = 2 ■ h 1, Var [X m ,n\ = ; : ■ 

m + n m + n — 1 

The usual application of the runs test makes a shuffle from the two draws as 
described above, calculates the number of runs in the shuffle, and then uses the 
above-calculated probabilities to test deviation from the uniform distribution 
on shuffles. However, the same method can be applied in any situation to test 
deviation from the uniform distribution on shuffles. In the present case, we can 
use an analogous process to investigate tree shuffles. 

As described in the introduction, a tree shuffle simply assigns a shuffle of 
appropriate size to each internal node of the tree; from the previous section 
we expect these shuffles to be distributed uniformly for a variety of neutral 
models. Using runs we can test whether a single shuffle is drawn from the 
uniform distribution, but some method is needed to combine this information 
across the internal nodes of the tree. 

We chose to combine our data from each vertex by simply summing the 
number of runs across all of the shuffles in the corresponding tree shuffle. Let 
7Z(T) denote this number. The distribution of 1Z(T) (under the assumption 
that each shuffle is equally likely) can be calculated recursively as shown in 
the next several paragraphs. There are two cases to consider. First, one may 
condition on the observed tree topology and calculate the neutral distribution 



/m-\-n\ 
V m ) 



cy (m — l\ /n — l\ 
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of 1Z(T) in that setting. A second option is to test deviation from a neutral 
model which gives the uniform distribution on ranked trees. This is a stronger 
statement than saying that a given model induces the uniform distribution on 
shuffles conditioned on the phylogenetic tree. 

We first condition on the observed tree. Uniform shuffles conditioned on the 
tree shape are obtained in the CRP and the CAL class of models. For a tree 
with one leaf, we have ¥{TZ(T) = 0} = 1. For a tree with two leaves, we also 
have ¥{TZ(T) = 0} = 1 (the two daughter subtrees have no internal nodes). 

For a tree T with uniform random ranking, composed of two ranked subtrees 
L and R of size m and n, respectively, we have: 

k k—i 

P{K(T) = k} = J2 nx m ,n = *} ^ H^L) = j}P{7e(i?) =k-i- j}. (2) 

i=0 3=0 

It is shown in the Appendix that this distribution can be calculated on a tree 
with n leaves in time 0(n 3 log 2 n). Thus it is practical to obtain a p-value for 
K{T) analytically. 

Now we take the second approach, assuming we want to test a model such 
that each ranked tree is equally likely. This includes the CAL models, and 
in the case of pure birth models, this is exactly the set of the CAL models 
(Proposition Q3J). Let lZ(n) be the random variable "runs of a tree with n 
leaves" where the tree is drawn from the uniform distribution on ranked trees. 
The distribution of lZ(n) can again be obtained recursively. Note that for a 
uniform ranked tree on n leaves, the probability that one daughter tree has size 
t and the other daughter tree has size n — r is l/(n — 1) for all r. Thus 

n— 1 k 

F{K(n) = k] = Y VP{A%,„-r = *}x 

7i — i ^-^ ^— i 

(3) 

P{^(r) = j}P{ll(n - r ) = k-i- j}. 

3=0 

The complexity for recursively calculating the distribution of runs for trees with 
n leaves is 0(n 4 log 2 n), by an argument analogous to that for Equation ([2}. 

Note that there are a number of alternative ways to "sums of runs" for testing 
deviation from the uniform distribution on shuffles. First, we have made one 
choice — namely, summation — concerning how the statistics for each shuffle are 
combined. One certainly could use an alternative method, potentially including 
weights. Second, there are other statistics such as Mann- Whitney- Wilcoxon 
which could be used in place of the runs statistic. The advantage of summation 
is that it results in simple formulas, and the advantage of the runs statistic is 
that it is easy to interpret. We have not tested any alternate formulations. 

A python package for computing quantiles of shuffles is available at 

\protect\vrule widthOpt\protect\href {http : //www-m9 . ma . turn . de/twiki/bin/view/ Allgemeines/Tan 
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One of the main features of this package is the ability to calculate the quan- 
tile of the runs statistic assuming a uniform distribution on rankings for a set 
of input trees. The quantiles can be calculated conditioned on a given tree 
shape, or under the assumption of a uniform distribution on ranked trees. For 
a collection of trees (e.g. a sample from the Bayesian posterior), the individual 
quantiles can be averaged. In addition to the calculation of the runs statistic 
and the quantile for the whole tree, the package can calculate the runs statistic 
and quantile for each interior vertex of a single tree. This feature may be useful 
for biologists looking for signals of a key innovation. 

Shuffles in the Bayesian setting 

In our work up to now, we have assumed that the correct tree and diver- 
sification timing information is known. This assumption is not realistic for a 
number of datasets. For example, below we apply our methodology to a sample 
of Hepatitis C viruses, which probably do not have enough sequence divergence 
to perfectly reconstruct a phylogenetic tree with timing information. 

One way of working with such datasets is to take a Bayesian approach, 
where rather than a single tree one gets a posterior distribution on trees. For 
each single tree, one can compute the quantile of the total runs statistic, either 
conditioning on the topology or assuming a Yule distribution of ranked tree 
shapes. We then simply take the average of the quantiles thus computed for 
each tree. Such averagin g can be justified in a manner similar to the work of 
Drummond and Suchardl ( 2007 ). except that no further simulation is needed 



to compute the p-value. The average of p-values in this case is not exactly 
uniformly distributed under the neutral model as a proper p-value should be, 
although the ave raged distrib ution does share many of the characteristics of a 
classical p- value ( Mend . 1994 ). 



Runs and neutrality 

Here we note that the runs statistic can be used to test the coalescent in the 
presence of ancestral population size variation. Tests of neutrality in the pres- 
ence of historical population size variation are of particular recent importance, 
as new coalescent-based methods are in use to i nfer population size histor y in 
a Bayesian framework (jDrummond et all l2005t lOpgen-Rhein et al. , 2005)'. If 



these methods are to be used on a given set of sequences it is important to 
test the central assumption of the methods, namely that the sequences have a 
genealogy which can be accurately described using the coalescent with arbitrary 
population size history. 

Unfortunate l y, cla ssical statistics such as the D statistics of Taiima ( 1989h 



and iFu and Lil (|1993f ) confound ancestral population size changes and non- 
neutral evolution. One solution to this problem is to investigate the Bayesian 
posterior on phylogenetic trees for evidence of non-neutral evo lution rather than 



using the sequence information directly. This has been done bv lDrummond and Suchard 



(|2007f ). who use a posterior predictive p- value approach. Here we simply point 
out that, as described above, the coalescent with arbitrary population size his- 
tory is a CAL model and thus will induce the uniform distribution on ranked 
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phylogenetic trees; thus by rejecting the CAL class we reject a general coalescent 
model. We will apply this fact below in the example application to Hepatitis C 
data. 

Generalization for non-binary trees 

Polytomies (i.e. non-binary splits) are common in reconstructed phyloge- 
netic trees. Some polytomies are certainly due to a lack of information to 
resolve the splits, however it has been argued that molecular and species level 



polytomies actually exist (packman et al.l . Il999t iSlowinskil . 120011 ). The method- 



ology described in this paper can be extended to trees with "hard" polytomies, 
i.e. cases of multiple divergence which are essentially simultaneous in evolution- 
ary time. 

The new ingredient needed is the "multiple runs distribution," i.e. the 
analog of (HI) for shuffle s on more than two symbols. This is described in 



David and Barton! (|l962l) . Using these distributions, the probability of a shuffle 



consisting of symbols from the k daughter trees can be found for a shuffle at a 
non-bifurcating split v. 

Example applications 

In this section we describe two distinct applications of the methods in this 
paper. First, we apply the methods to El gene data for the Hepatitis C virus 
(HCV). This data set shows some limited- though consistent- lineage-specific 
bursting diversification, showing that neither CAL nor CRP models accurately 
describe the sort of evolution observed. However, an analysis not conditioning 
on tree shape clearly rejects any CAL model, such as the coalescent with varying 
population size. The second application is to phylogenetic trees for ants, whose 



timin g information was reconstructed through fossils and the r8s (jSanderson , 



l2003h rates smoothing program. These ant trees do not show any evidence of 
lineage-specific bursting evolution, despite some interesting history in terms of 
diversification rates. 



Our HCV data comes from two indepe ndent studies: one in China (|Lu et al 



and one in Egypt (|Rav et all 120001) . The HCV alignments were retrieved 



from the LANL HCV database (|Kuiken et ail . l2005f) via PubMed article ID 



numbers. The Chinese dataset contained samples from 132 infected individu- 
als, and the Egyptian dataset had samples from 71 individuals. We randomly 
partitioned the taxa from the Chinese dataset into three sets of 44 taxa each 
and used the corresponding sub- alignments as distinct data sets. The Egyptian 
data was similarly split into two sub- alignments of size 37 and 36. This par- 
titioning was done in order to have a larger number of similar datasets from 
which we could investigate the dynamics of HCV evolution, and to demonstrate 
that non-neutral evolution can be seen even with an moderate number of taxa. 
In order to avoid confounding temporal i nformation with molecular rate vari- 



ation, we applied the relaxed clock model of lDrummond et alJ (I2006b[) as imple 



ment ed in the BEASTvl.4 suite of computer programs (jDrummond and Rambaut 



20071) . We chose uncorrelated lognormally distributed local clocks, the HKY 
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Data set 



Const, cond. Exp. cond. 



Const. CAL Exp. CAL 



China set 1 
China set 2 
China set 3 
Egypt set 1 
Egypt set 2 



0.232 0.254 

0.191 0.17 

0.239 0.259 

0.261 0.299 

0.308 0.256 



0.0415 0.0601 

0.041 0.0349 

0.0287 0.0265 

0.045 0.0624 

0.0242 0.0188 



Table 1: Expected quantiles of the number of runs in the posterior for a Bayesian 
analysis as described in the text. Each row represents one dataset. "Const." 
means the BEAST analysis with a constant population coalescent prior, and 
"Exp." denotes analysis an exponentially increasing population size coalescent 
prior. The "cond." label means that we analyze conditional on tree topology, 
which gives us thus the quantile for any neutral model inducing the uniform 
distribution on shuffles. "CAL" denotes the runs quantile under the assump- 
tion of the uniform distribution on ranked trees, as would be the case for any 
CAL model, such as the coalescent with arbitrary population size history. As de- 
scribed in the text, the "cond." columns show that some limited lineage-specific 
bursting is seen, and the CAL column rejects the coalescent with arbitrary pop- 
ulation size history. 

model, and four categories of gamma rate parameters in the gamma + invariant 
sites model of sequence evolution. We used both the constant population size 
and exponential growth coalescent priors. All other parameters were left as de- 
fault; the corresponding BEAST XML input files are available from the authors 
upon request. 

In each case the MCMC chain was run for 10 million generations, and con- 
vergence to stationarity checked with the BEAST program Tracer. For each 
model parameter, the minimum effective sample size (ESS) was at least 164, 
with most being significantly greater. The coefficient of variation of the relaxed 
clocks in the analysis had a minimum of 0.336 and an average of 0.491, indicat- 
ing a significant deviation from a strict clock for this data set. The first 10% 
of the run was removed and 100 trees were taken from the tree log file, equally 
spaced along the run of the MCMC chain. We interpret these trees as being 
independent samples from the posterior. As a check, the analysis was run with 
an empty alignment and no consistent deviation from the uniform distribution 
on shuffles was detected (results not shown). 

We have displayed the results in Table 1. In the columns labeled "cond." 
we show the quantile of the number of runs conditioning on tree shape, calcu- 
lated as in Equation ([2]). As can be seen, the results are substantially below 
one half, with the maximum being 0.308. Although this is not exceptionally 
strong lineage-specific bursting behavior, it does so consistently across five sam- 
ples from two independent studies. Thus we feel confident in saying that the 
evolution of HCV displays lineage-specific bursting behavior. It might also be 
noted that these results were gained despite the fact that the coalescent was 
used as a prior. That is, if any bias could be expected in the Bayesian analysis, 
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Figure 3: A visualization of the number of runs in a posterior sam ple of trees 
for an alignment of Hepatitis C sequence data of Rav et al. ( 2000l ). Black bars 



represent fewer runs than neutral, and gray the opposite. As described in the 
text, the width of the bars represents the amount of divergence from a broad 
class of neutral models. Said simply, each black bar represents a tree in the 
posterior which displays evidence of lineage-specific bursting diversification, and 
the longer the bar, the more bursting the tree. The bars are sorted vertically 
by increasing size (with sign.) Figure (a) shows the results when the tree prior 
in BEAST was taken to be coalescent with constant population size. Figure (b) 
shows the corresponding results with the exponentially increasing population 
size prior. 



it would be towards a coalescent prior and a uniform distribution on shuffles, 
thus we believe our results form an upper bound for the actual statistics of the 
HCV lineages. 

We have displayed a graphical representation of the results for the second 
Chinese data set in Figure 3. Each horizontal bar represents one of the 100 
ranked trees from the posterior. One side of the bar gives the number of runs 
in the ranked tree T, and the other side gives the expected number of runs for 
a neutral (i.e. CAL or CRP) tree of the same unranked topology as T. If T has 
more runs than the expectation, the bar is colored gray; if fewer it is colored 
black. In both the cases of constant population size and exponentially increasing 
population size coalescent prior for BEAST, it can be seen that there are fewer 
runs than the expectation, meaning that it appears that the HCV data under 
investigation may have had periodic bursts of diversification in its past. 

Now we apply our techniques as a statistical test for the coalescent with 
ancestr al population size variation as described abo ve. This is topical: we note 
that the lRav et all (|2000l ) HCV data was analyzed bv lQpgen-Rhein et al.l (|2005h 
as an example application of a reversible-jump Bayesian MCMC algorithm for 
estimating demographic history of the virus. In doing so they made an im- 
plicit assumption of neu trality because their method [and other such methods 
( Drummond et al. . 20051 )] are based on the coalescent. They did not test this 
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neutrality assumption as no methods were available at the time to test for neu- 
tral evolution in the presence of ancestral population size changes. 

Our method can do so. Specifically, we compare the number of runs to the 
distribution for an arbitrary CAL model, as in Equation By the results in 
the right half of Table 1, one can see that the data does not follow a coalescent 
model with arbitrary population size history. Th is implies a significant model 
mis-specification in the lQpgen-Rhein et al. I (|2005f) paper; it would be interesting 
to know how this would impact the historical population size estimates in their 
paper. 

For the second app lication we investigat ed two different trees of ant taxa. 
The first tree is that of Moreau et al. (2006), showing the diversification of the 
major ant lineages. The timing information in this tree is quite remarkable, 
in that the corresponding lineages-through-time (LTT) plot shows a substan- 
tial increase in diversification rate during the Late Cretaceous to Early Eocene, 
which corresponds to the rise of angiosperms (flowering plants) . Given the tools 
at our disposal, one might wonder if this increase in diversification rates affected 
all lineages equally, or if it occurred in lineage-specific bursts. The second ant 
tree we investigated was that of Pheidole, a "hyperdiverse" ant genus. Phei- 
dole is almost certainly monophyletic, and yet co mprises about 9.5% of the ant 



in review). Moreau 



species in the world, according to latest estimates ([Moreaul . 
has recently reconstructed a phylogeny of this genus which we have analyzed 
along with the tree of the ant lineages in general. Both trees were reconstructed 
via maximum likelihood, then made ultrame tric using the pen alized likelihood 
method of the r8s rates smoothing program ( Sanderson . 2003h . 
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Figure 4: The distribution of the runs statistic for the internal nodes in two 
trees of ant taxa. Each point in each plot represents an internal node in the 
corresponding tree; the x axis gives the number of taxa below the internal node 
and the y axis gives the quantile of that interna l node in terms of the runs 
statistic. Figure (a) is the tree of Moreau et al. ( 2006f l. and Figure (b) is a 
tree of Pheidole. These two trees do not appear to consistently show either 
lineage-specific bursting or refractory diversification. 
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In Figure 4 we show a plot of the internal nodes of each tree. The x co- 
ordinate in the plot is the number of taxa below an internal node, and the y 
axis is the quantile of the number of runs in the shuffle statistic. As can be 
seen, there is no clear correlation between number of taxa below an internal 
node and the shuffle statistic, and at no stage does diversification appear to 
be consistently bursting or refractory in a lineage-specific sense. We can also 
compute the quantile of the total number of runs across the tree: for tree (a) 
this is about 0.9052 and for tree (b) this is about 0.6718. Thus for these two 
ant trees we do not see any significant evidence of lineage-specific bursting or 
refractory diversification. This analysis forms an interesting counterpoint to the 
LTT results for the ants, which shows an overall increase in diversification in 
rate during the Late Cretaceous to Early Eocene across the entire tree. 

Conclusion 

We have developed a framework which allows testing for non-neutral diver- 
sification timing. Our work consists of three main components: first, a simple, 
recursive way of quantifying the relative timing information on a phylogenetic 
tree; second, two classes of neutral models on trees with timing information; 
and third, a summary statistic which allows comparison of reconstructed trees 
to these neutral models. In our methodology, timing information is considered 
relative to sister taxa and considered in the context of the tree, which may 
make it a valuable complimentary method to lineages-through-time plots. We 
compute the significance of the deviation of timing information from a neutral 
model analytically, using a simple method drawn from classical statistics. 

This method was conceived for the macroevolutionary case, in order to find 
historical evolutionary patterns requiring explanation. However, it is also quite 
applicable in the microevolutionary case, where it can test neutrality in the 
presence of historical population size variation. This is particularly relevant as 
methods are becoming available to describe historical population size under a 
coalescent assumption. 

We emphasize that our methodology can go substantially beyond testing 
for deviation from the constant rate birth and death models, which are usually 
the entire class of "neutral" models considered. Indeed, because any CAL or 
CRP model induces the uniform distribution on shuffles, deviation from this 
distribution is evidence to reject any model in the CAL or CRP classes. Such 
a conclusion is much stronger than deviation from a constant rate birth and 
death model, which is only one of the CAL models. 

However, sometimes one may wish to test only a more restricted set of mod- 
els, such as only the CAL models (which include the coalescent with arbitrary 
population size history) and not the more general CRP models. By testing a 
more restricted class of models, a particular dataset will be more likely to fall 
outside the chosen class. For example, in the application of our methods to 
Hepatitis C data above, the data consistently shows evidence of not coming 
from a CRP model, although the corresponding quantile is in the 0.17 to 0.31 
range. However, if one tests for conformity to the CAL class (again, including 
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the coalescent with arbitrary population size history) one obtains rejection at 
the 5% level. 

We applied the methods on two types of data. First we investigated data 
for the Hepatitis C virus, demonstrating that it consistently shows evidence of 
some limited lineage-specific bursting evolution. We then applied our methods 
to reject the coalescent with arbitrary population size history for this data. 
Next, we investigated two phylogenetic trees of ants, which showed no evidence 
of unusual relative diversification rates, despite an interesting history of overall 
diversification rates. 

We recall that our method uses "relative" timing information rather than 
actual branch lengths. In many ways this is an advantage. In a microevolu- 
tionary setting this means that the corresponding tests are invariant to changes 
in ancestral population size, and thus our test for neutrality is not "fooled" by 
ancestral population size variation. In a macroevolutionary setting the statis- 
tics are robust to branch length estim ation err o r ove r long time scales. Such 
estimations are known to be difficult (|Kimural . Il98lh . We note further that 
from a modeling perspective it is possible to specify a probability distribution 
on ranked phylogenetic trees without specifying a particular distribution on 
branch lengths. This flexibility means that it may be possible to reject many 
models at once as described above. 

Nevertheless it may be useful at some future stage to combine topology 
and continuous branch length information, rather than the discretized version 
considered here. However, quantifying the shape of such objects appears to 



be challenging, as t he rel evant geometry is quite intricate ( jBillera et all 12001 



Moulton and Steel . 2004 ). In contrast, by discretization to ranked trees we 



obtain a purely combinatorial object. 

We close by noting that although various techniques for reconstructing phy- 
logenetic trees with timing information have been present for many years, these 
methods are currently seeing an intense period of development and will only 
improve. With this improvement we expect to see an increase in the number of 
trees present in the literature with interesting patterns of diversification timing 
due to adaptive radiation or other factors. We hope that our technique will 
prove to be a useful analytical tool for these future investigations, not only for 
finding interesting diversification patterns, but also for testing potential biases 
of timing reconstruction methods. 
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Appendix 

Here we provide a proof of the time-complexity bound for the computation 
of the runs distribution 1Z(T) (i.e. conditioning on a given tree shape). This 
distribution may be computed easily for certain tree shapes, such as the comb 
tree. However, here we provide a bound which holds for all tree shapes. This 
bound makes use of a bound on the number of runs in a ranked tree. 

Let r(n) denote the maximum number of runs for a ranked tree with n leaves. 
Thus r(l) = r(2) = 0, r(3) = 1 and r(4) = 2. Let 7 i= „ /2 be 1 if i = n/2 and 
otherwise. For a tree with at least 2 leaves, if the first branch point has i leaves 
on one side and n — i leaves on the other, with i < n — i, then the number of 
runs at this vertex may be up to 2(i — 1) + 1 — I i=n /2 (note that we have an 
(i — 1, n — i — 1) shuffle at this vertex). This maximum is obtained by a shuffle 
which interleaves the elements from each set, one from each side for as long as 
possible, starting with the largest side. 

Thus, r(n) satisfies the following recurrence: r(l) = r(2) = and for n > 2: 

r(n) = max (2i - 1 - I l=n / 2 + r(i) + r(n - i)) 

\<i<n/2 X ' ' 

Proposition 16. For all integers n>l, r(n) < nlog 2 n. 

Proof. The statement is true for n = 1. Suppose that the statement is true for 
all k < n. Then, 

1 - h=n/2 + r{i) + r(n - i)) 
1 + i log 2 i + (n - i) log 2 (n - i)) . 

Note that 2i — 1, i log 2 i and (n — i) log 2 (n — i) are all convex functions of i so their 
sum is convex also. Thus, the maximum of 2i — 1 + i log 2 i + (n — i) log 2 (n — i) 
occurs at an extreme value. Setting i = 1 gives 1 + + (n — 1) log 2 (n — 1), while 
setting i = § gives 2| - 1 + 2| log 2 | = n(log 2 2 + log 2 |) - 1 = nlog 2 n — 1. 
Both of these values arc less than nlog 2 n and so r(n) must be at most nlog 2 n. 
The result follows for all n > 1 by induction. □ 

We now proceed to bound the complexity of computing the distribution of 
runs for a tree. For a tree T with 1 or 2 leaves, the number of runs is always 0. 

Let T be a tree with n > 3 leaves; we assume a uniform distribution on tree 
shuffles. Let L and R be the two randomly ranked subtrees of T, with a and b 
leaves respectively. 



r(n) = max (2i — 

l<i<n/2 V 

< max (2i — 

l<i<n/2 
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Equation @ may be rewritten as follows: 

nnT) = k} = J2 p i x ^ = *} £ P ^ L ) = ft p i n ( R ) = k - % - ft 

»=0 j=0 
Ai 

= ^F{X a , b = i}F{K(L)+1l(R) = k-i} (4) 
1=1 

where A\ = min(fc,n) and A 2 = min(fc — i,r(a)). Note that a + b = n > 3 
implies X afi > 1 and 1Z(T) > 1. 

Since 1Z(T) is supported on (i.e. zero outside of) k = 1, ["-log 2 n\, and 
for each k, the computation of Equation (0| costs 2n — 1 operations, the cost 
of computing its distribution with this formula is at most (|jilog 2 n\)(2n — 1) 
arithmetic operations plus the cost of computing P{X a .b = i} for i — 1, .. . ,n 
and ¥{TZ(L) + K(R) = x} for x = 0, . . . , r(n) - 1 < n log 2 n - 1. 

For these fixed a and 6, the values of P{X aj 6 = i} can be calculated using 
Equation |T]) in constant time (at most 5*2 + 4 = 14 arithmetic operations each) 
with a linear overhead as follows. The binomial coefficients m for a < b and 
A; < b in Equation |T]) may be calculated with at most two arithmetic operations 
from the factorials, j\ for 1 < j < n, which may in turn be pre-calculated in 
linear time (n— 1 multiplications). Thus, calculating P{X aj b = i} for i = 1, . . . , n 
takes at most 14n arithmetic operations, with a one-time overhead of n — 1. 

The distribution of P{1Z(L)+1Z(R) = x} is supported on x = 0, . . . , [n log 2 nj — 
1. It may be computed by repeated application of the formula 

L(n-l) log 2 (n-l)J 

F{K(L) + K(R) = x} = ^ V{K(L)=j}V{K(R)=x-j} 

j=o 

as long as the distributions of 1Z(L) and 1Z(R) arc know. This computation re- 
quires at most nlog 2 n (2(n — 1) log 2 (n — 1) + 1) arithmetic operations: at most 
(n — 1) log 2 (rt — 1) + 1 multiplications and (n ~ 1) log 2 (n — 1) additions for each 
of nlog 2 n values of x. Note that the distribution of F{1Z(L)} is supported by 
j = 0, . . . , [("■ — 1) log 2 (n — 1)J , since L has at most n—1 leaves. 

So, if the distribution of 1Z(L) and 1Z(R) are known, the distribution of 1Z(T) 
may be calculated in at most 

(|nlog 2 n\)(2n - 1) + 14n + nlog 2 n (2(n - l)log 2 (n - 1) + 1) 

arithmetic operations. This is at most 

2n 2 log 2 n + 2n 2 log 2 n + I4n 

for all n > 3. Since 1Z(T) is for n = 1, 2 the time to calculate it is 0. 

This procedure may be applied recursively, computing the distribution of 
runs of all subtrees before finally computing the run distribution of T, Since 
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there are n— 1 internal vertices and each has at most n leaves below it, the total 
number of arithmetic operations required is at most n(2n 2 log 2 n + 2n 2 log 2 n + 
14ra + 1) (including the overhead for pre-computing This is 0(n 3 log 2 , n). 
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